Power et al.

mentions: 1, type: Person

// recent coverage 1 mentions

03:10

2026-05-20

wanglun1996.github.io

large-language-models

Evals Will Break and You Won't See It Coming

Current evaluation methods for large language models (LLMs) are fundamentally reactive and fail to anticipate qualitative shifts in capabilities, such as emergent abilities or strategic information wi…

// co-occurs with top 3 entities

Wei et al. (1), Liu et al. (1), Schaeffer et al. (1)